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Abstract 

This article reviews quantitative methods to estimate the basic reproduction 
number of pandemic influenza, a key threshold quantity to help determine the 
intensity of interventions required to control the disease. Although it is difficult 
to assess the transmission potential of a probable future pandemic, historical epi- 
demiologic data is readily available from previous pandemics, and as a reference 
quantity for future pandemic planning, mathematical and statistical analyses of 
historical data are crucial. In particular, because many historical records tend 
to document only the temporal distribution of cases or deaths (i.e. epidemic 
curve), our review focuses on methods to maximize the utility of time-evolution 
data and to clarify the detailed mechanisms of the spread of influenza. 

First, we highlight structured epidemic models and their parameter estima- 
tion method which can quantify the detailed disease dynamics including those 
we cannot observe directly. Duration-structured epidemic systems are subse- 
quently presented, offering firm understanding of the definition of the basic and 
effective reproduction numbers. When the initial growth phase of an epidemic is 
investigated, the distribution of the generation time is key statistical information 
to appropriately estimate the transmission potential using the intrinsic growth 
rate. Applications of stochastic processes are also highlighted to estimate the 
transmission potential using similar data. Critically important characteristics 
of influenza data are subsequently summarized, followed by our conclusions to 
suggest potential future methodological improvements. 
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1 Introduction 

Influenza epidemics are observed around the world during the wintertime and with a 
strong seasonal component in temperate regions [T][2j. Influenza is a disease caused 
by the influenza virus, an RNA virus belonging to the Orthomyxoviridae [3]. Many 
features are common with those of the paramyxovirus infections of the acute upper 
respiratory tract. Typical symptoms of the disease are characterized by fever, myal- 
gia, severe malaise, non-productive cough, and sore throats. The disease spreads when 
an infected individual coughs or sneezes and sends the virus into the air, and other 
susceptible individuals inhale the virus. The virus is also believed to be transmitted 
when a person touches a surface that is contaminated with the virus {e.g. door knob, 
etc.) and then touches the nose or eyes. Infected individuals can transmit the virus 
almost within a day following infection [i.e. latent period). Although it is generally 
believed that infected individuals can pass the virus for 3-7 days following symptom on- 
set, there is some uncertainty on the duration of the infectious period. The generation 
time [i.e. sum of latent and infectious periods) for influenza, reported and assumed in 
the literature, ranges from 3 days [1][5][6] to about 6 days [Z][8]. 

Individuals that are infected with influenza are believed to become permanently 
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immune against the specific virus strain. Hence, the virus is able to persist in the human 
population through relatively minor (single point) mutations in the virus composition 
known as drifts. Influenza (sub)types A/H3N2, A/HINI and B are currently co- 
circulating in the human population [9j. Major changes in the virus composition via 
recombination or gene reassortment processes (known as genetic shifts) can lead to 
the emergence of novel influenza viruses with the potential of generating dramatic 
morbidity and mortality levels around the world 

The 1918-19 influenza pandemic known as the Spanish influenza caused by the in- 
fluenza virus A (HlNl) has been the most devastating in recent history with estimated 
worldwide mortality ranging from 20 to 100 million deaths [TU] [H] with a case fatality 
of 2-6 percent [12] [T3]. The worldwide 1918 influenza pandemic spread in three waves 
starting from Midwestern United States in the spring of 1918 [llj[15]. The deadly 
second wave began in late August probably in France while the third wave is generally 
considered as part of normal more scattered winter outbreaks similar to those observed 
after the 1889/90 pandemic [11]. Subsequent pandemics during the 20th century are 
attributed to subtyes A (H2N2) from 1957-58 (Asian influenza) and A (H3N2) in 1968 
(Hong Kong influenza) [16|. 

The ability to quickly detect and institute control efforts at the early stage of an 
influenza pandemic is directly linked to the flnal levels of morbidity and mortality in 
the population [13]. To appropriately assess the disaster size of a probable future pan- 
demic, we have to quantify the transmission potential (and its associated uncertainty). 
Although it is difficult to directly measure the transmissibility of a future pandemic. 
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historical epidemiologic data is readily available from previous pandemics, and as a 
reference quantity for future pandemic planning, mathematical and statistical analyses 
of the historical data can offer various insights. In particular, because many historical 
records tend to document only the temporal distribution of cases or deaths {i.e. epi- 
demic curve), we modelers have faced with a difficult need to clarify the mechansms of 
the spread of influenza using such time-evolution data alone. In this paper, we review 
a number of mathematical and statistical methods for the estimation of the transmis- 
sion potential of pandemic influenza, focusing on theoretical techniques to maximize 
the utility of the temporal distribution of influenza cases. The methods that have 
been incorporated in this review include the applications of epidemiologically struc- 
tured epidemic models, explicitly duration-structured epidemic system, and stochastic 
processes {i.e. branching and counting processes). Whereas this review does not cover 
the spread of influenza in space, spatial heterogeneity in transmission and the growing 
interest in the role of contact network are briefly discussed as the future challenge. 



2 On the definition of the transmission potential 

The basic reproduction number, Rq (pronounced as R nought), is a key quantity used 
to estimate transmissibility of infectious diseases. Theoretically, Rq is defined as the 
average number of secondary cases generated by a single primary case during its entire 
period of infectiousness in a completely susceptible population [17]. As the epidemic 
progresses, the number of susceptible individuals is decreased due to infection, and the 
reproduction number decays following R{t) = RoS{t)/ S{0) where S{t) and S{0) are, 
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respectively, the number of susceptible individuals at time t and before the epidemic 
starts; the latter is equivalent to the total population size given that all individuals 
are susceptible before the beginning of an epidemic. Clearly, this definition only applies 
to (novel) emerging infectious diseases {e.g. the epidemic of severe acute respiratory 
syndrome (SARS) from 2002-3) or re-emerging infectious diseases that had not circu- 
lated in the population in question for long enough to allow for residual immunity in 
the population to disappear due to births and deaths. 

The reproduction number is directly related to the type and intensity of interven- 
tions necessary to control an epidemic since the objective is to make R{t) < 1 as soon 
as possible. To achieve R{t) < 1, one or a combination of control strategies may be 
implemented. For example, one of the best known uses of Rq is in determining the 
critical coverage of immunization required to eradicate a disease in a randomly mixing 
population. That is, when vaccine is available against a disease in question, it is of 
interest to estimate the critical proportion of the population that needs to be vacci- 
nated {i.e. vaccination coverage) in order to attain R < 1 [18] [I9]. For example, in 
the U.S prior to 1963, a vaccine against measles was not available and hence recurrent 
epidemics of measles were observed with approximately 3 — 4 million cases per year 
and a mean of 450 deaths. The introduction of the vaccine in the U.S. reduced the 
incidence by 98 percent. 

The critical vaccination coverage, pc (in a randomly mixing population) can be es- 
timated from the Rq of the disease in question as follows [20j: 
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Pc > 




Ro 



1 



) 



(1) 



where e is the efficacy {i.e. direct effectiveness) of vaccination [21j. pc given in ([T]) 
suggests that the disease could be eradicated even when all susceptible individuals are 
not vaccinated. The protection conferred to the population by achieving a critical vac- 
cination coverage is known as herd immunity [22] [23] . 

A brief history of the theoretical developments on the basic reproduction number 
and its analytical computation via epidemic modeling is given elsewhere [23] |24j [25] . 
The mathematical definition and calculation of Rq using next-generation arguments 
is described initially by Odo Diekmann and his colleagues [17] [26], where Rq is the 
dominant eigenvalue of the resulting next generation matrix. Further elaborations and 
reviews can be found elsewhere [27] [2H] [29] [30] [3l] [32] [33] . Classically, rather than the 
threshold phenomena, Rq was used to suggest the severity of an epidemic, because the 
proportion of those experiencing infection at the end of an epidemic depends only on 
Rq [31] (see Section 3). 

Statistical methods to quantitatively estimate Rq have been reviewed by Klaus Di- 
etz [35]. Depending on the characteristics of data and underlying assumptions of the 
models, Rq can be estimated using various different approaches [36] . In addition to the 
final size equation, Rq of an epidemic of newly emerging disease can be estimated from 
the intrinsic growth rate [4] [6] [8] [18] [37] [38] , which is also referred to as the rate of 
natural increase, suggesting the natural growth rate of infected individuals in a fully 
susceptible population (discussed in Section 4). Moreover, for simple epidemic models 
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with relatively few parameters, can be estimated with other unobservable quantities 
by rigorous curve fitting of model equations to the observed epidemic data (discussed 
in Section 3) [38] [39] [IQ] . Not only i?o but also R{t) can be estimated from the temporal 
distribution of infectious diseases, reconstructing the transmission network or inferring 
the time-inhomogeneous number of secondary transmissions [H] [12] [l3] [H] . 

To estimate the basic reproduction number of endemic diseases, different approaches 
are taken. One would need first to carry out serological surveys to quantify the frac- 
tion of the population that is effectively protected against infection {i.e. age- and/or 
time-specific proportion of those possessing acquired immunity needs to be estimated). 
Through this effort, the force of infection, the rate at which susceptible individuals 
get infected, is estimated [15]. For example, this is the case for rubella, mumps and 
measles (that are still circulating in some regions of the world even when high effective 
vaccination coverage is achieved). Although the estimation of Rq for endemic diseases 
is out of the scope of this review, methodological details and the applications to esti- 
mate the force of infection and Ro can be found elsewhere [16] [17] [IE] [19] [50] [51] [52] . 

In practice, the reproduction number denoted simply by R and defined as the num- 
ber of secondary cases generated by a primary infectious cases in a partially protected 
population might be useful. R can also be estimated from the initial growth phase of an 
epidemic in such a partially immunized population. In a randomly mixing population, 
the relationship between the basic reproduction number (Rq) and the reproduction 
number (R) is given by i? = (1 —p)Ro where p is the proportion of the population that 
is effectively protected against infection (in the beginning of an epidemic). Besides, for 
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many recurrent infectious diseases including seasonal influenza, estimating the back- 
ground immunity p in the population is extremely diflicult due to cross- immunity of 
antigenically-related influenza strains and vaccination campaigns. 

With reagard to seasonal influenza, the reproduction number (i?) over the last 3 
decades has been estimated at 1.3 (SE 0.05) in the United States, France, and Australia 
with an overall range of 0.9 — 2.0 [53j. An R estimate of 1.5 has been reported for a 
single A/H3N2 season in France [51j, and some estimates have been reported in the 
range 1.4-2.6 for several consecutive influenza seasons in England and Wales |55j|56j . 
A particularly high estimate of R has been suggested for the 1951 influenza epidemic 
in England and Canada [57] . 

Because influenza pandemics such as the Spanish flu from 1918-19 are associated to 
the emergence of novel influenza strains to which most of the population is susceptible, 
it might be reasonable to assume that the reproduction number R Rq. Previous 
studies have estimated that Rq of the 1918-19 influenza pandemic ranged between 1.5 
and 5.4 [H] [3H] [10] [39] [57] [M] [59] [60] [61] [62^ depending on the speciflc location 

and pandemic wave considered, type of data, estimation method, and level of spatial 
aggregation, which has ranged from small towns to entire nations with several million 
inhabitants. Table [T] lists estimates of Rq in recent studies. The variability of Rq es- 
timates suggests that local factors, including geographic and demographic conditions, 
could play an important role in disease spread [65] [66] [67] . In the following sections, 
we review how these estimates are obtained and how we shall interpret the estimates, 
starting from a simple structured epidemic model proposed in 1927. 
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3 Estimating Rq using a structured epidemic model 

Mathematical models provide a unique way to analyze the transmission dynamics and 
study various different scenarios associated to the spread of communicable diseases 
in population(s) [18] [68] [69] . The history of the mathematical modeling of infectious 
diseases greatly remounts to the study of Sir Ronald Ross in 1911 [70] who invented 
a classic malaria model and also discovered the mosquito-borne transmission mecha- 
nisms of malaria. Employing a mass action principle for the spread of malaria, Ross 
explored the effects of controling the mosquito population using simple mathematical 
models [7T]. Following his effort, Kermack and Mckendrick introduced a classical SIR 
(susceptible-infectious-removed) epidemic model in 1927, which is most frequently uti- 
lized in the present day, given by the following system of nonlinear ordinary differential 
equations (ODEs) 



dS{t) _ j3S{t)I{t) 

ll{t) (2) 



dt N 
dl{t) _ l3S{t)I{t) 

dt ~ N 



dt 

where S{t) denotes susceptible individuals at time t\ I(t), infected (assumed infectious) 
individuals at time t; and R(t), recovered (assumed permanently immune) individuals 
at time t; (3 is the transmission rate; 7 the recovery rate; and is the total popula- 
tion size which is assumed constant for a closed population {i.e. a population without 
immigration and emmigration). Susceptible individuals in contact with the virus en- 
ter the infectious class (J) at the rate j3I /N . That is, homogeneous mixing between 
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individuals is assumed. 

The basic reproduction number, i?o, for the epidemic system ([2]) is given by the 
product of the transmission rate and the mean infectious period. That is: 

Ro = - (3) 

7 



Classically, Rq has been known as a quantity to suggest severity of an epidemic 
Indeed, analytical expression of Rq in ([3]) is derived simply by solving the above system 
([2]). Replacing I{t) in the right hand side of dS(t)/dt by {l/'j)dR(t)/dt, we get 

1 dS{t) (3 dR{t) 



S{t) dt jN dt 

Integrating both sizes of (jl]) from to infinity. 



(4) 



Since S{oo) = N — R{oo), and because we assume S{0) = N and -R(O) = 0, equation 
([5]) can be rewritten as 

N-R{oo) ^ _PR{oo)_ 

In the above equation ([6]), final size, i.e., the proportion of those experiencing infection 
among a total number of individuals in a community following a large scale epidemic, 
is defined as p := R{oo)/N. That is, 

In(l-p) = -i?oP (7) 

Therefore, the following final size equation of an autonomous SIR (or SEIR) model 
is obtained: 

iio = (8) 

P 
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Equation can be analytically derived using both deterministic (models governed 
by ODEs or partial differential equations (PDEs))[73j and stochastic models 

Despite the usefulness of ([2]), SIR assumptions given by ODEs are not always di- 
rectly applicable to real data. One of the reasons include that there is no disease where 
an infected individual can cause secondary transmission immediately after his/her in- 
fection. 

Accordingly, we have used slightly extended compartmental models in the previ- 
ous studies to describe the transmission dynamics of the 1918-19 influenza pandemic 
and estimate the reproduction number [38] [39]. We now describe two different SEIR 
(susceptible-exposed-infectious-removed) models that have been used to estimate the 
reproduction number. The first model is the simple SEIR model, and the second model 
accounts for asymptomatic and hospitalized individuals. 

The simple SEIR model classifies individuals as susceptible (S), exposed (E), infec- 
tious (I), recovered (R), and dead (D) [TS]. Susceptible individuals in contact with the 
virus enter the exposed class at the rate pi(t)/N, where j3 is the transmission rate, 
I(t) is the number of infectious individuals at time t and = S(t) + E{t) +I(t) +R(t) 
is the total population for any t. The entire population is assumed to be susceptible 
at the beginning of the epidemic. Individuals in latent period (E) progress to the 
infectious class at the rate k (where 1/k suggests the mean latent period). We as- 
sume homogeneous mixing {i.e. random mixing) between individuals and, therefore, 
the fraction I{t)/N is the probability of a random contact with an infectious individ- 
ual in a population of size A^. Since we assume that the time-scale of the epidemic 
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is much faster than characteristic times for demographic processes (natural birth and 
death), background demographic processes are not included in the model. Infectious 
individuals either recover or die from influenza at the mean rates 7 and 6, respectively. 
Recovered individuals are assumed protected for the duration of the outbreak. The 
mortality rate is given by 5 = 7 [CFP/ (1-CFP)], where CFP is the mean case fatality 
proportion. The transmission process can be modeled using the system of nonlinear 
differential equations: 

' dS{t) f3S{t)I{t) 



dt 
dE{t) 

dt 
dl{t) 

dt 
dR{t) 

dt 
dD{t) 

dt 
dC{t) 

dt 



N 



kE{t) 



Nit) 

kE{t) - (7 + 5)I{t) 

51{t) 
kEit) 



(9) 



where C{t) is the cumulative number of infectious individuals. The basic reproduction 
number of the above system ^ is given by the product of the mean transmission rate 
and the mean infectious period, Rq = l3/{^ + 5). 

A more complex SEIR model (Figured]) classifies individuals as susceptible (S'), 
exposed (i?), clinically ill and infectious (J), asymptomatic and partially infectious 
(A), diagnosed and reported (J), recovered (i?), and death [D). The birth and natural 
death rates are assumed to have a common rate // (60-year life expectancy as in [38]). 
The entire population is assumed susceptible at the beginning of the pandemic wave. 
Susceptible individuals in contact with the virus progress to the latent class at the 
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rate l3{I{t) + J{t) + qA{t))/N where {3 is the transmission rate, and < g < 1 is a 
reduction factor in the transmissibihty of the asymptomatic class {A). Since there is 
no exphcit evidence estimating and proving the effectiveness of pubhc health interven- 
tions, and because a high burden was placed upon the sanitary and medical sectors, 
diagnosed/hospitalized individuals (J) are assumed equally infectious. Although it is 
difficult to explicitly evaluate the difference in infectiousness between general commu- 
nity and hospital, we roughly made this assumption since 78 percent of the nurses of the 
San Francisco Hospital contracted influenza [75] . A more rigorous assumption requires 
either statistical analysis of more detailed time-series data [76j or an epidemiological 
comparison of specific groups by contact frequency [7^. The total population size at 
time t is given by = S{t) + E{t)+I{t)+A{t) + J{t)+R{t). We assumed homogeneous 
mixing of the population and, therefore, the fraction (/(t) + J{t) + qA{t))/N is the 
probability of a random contact with an infectious individual. A proportion < p < 1 
of latent individuals progress to the clinically infectious class (/) at the rate k while 
the remaining (1 — p) progress to the asymptomatic partially infectious class {A) at the 
same rate k (fixed to 1/1.9 days~^ [8]). Asymptomatic cases progress to the recovered 
class at the rate 71. Clinically infectious individuals (class /) are diagnosed (reported) 
at the rate a or recover without being diagnosed (e.g., mild infections, hospital refusals) 
at the rate 71. Diagnosed individuals recover at the rate 72 = 1/(1/71 — l/a) or die 

at rate 5. The mortality rates were adjusted according to the case fatality proportion 

C FP 

(CFP), such that 6 = ^_^pp if^ + 72). 

The transmission process can be modeled using the following system of nonlinear 
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differential equations: 




' dS{t) 




dt 




dE{t) 




dt 




dA{t) 




dt 




dl{t) 


< 


dt 




dJ{t) 




dt 




dR{t) 




dt 




dD{t) 




dt 




dC{t) 




^ dt 



- {k + ^i)E{t) 



(10) 



: k{l-p)E-{^, + ^i)A{t) 
-- kpE{t) - {a + -fi + fi)I{t) 
-- aJ(t) - (72 + 5 + /i)J(t) 

: ^Mit)+Iit))+^2J{t)-fiRit) 

-- 6J{t) 
-- al{t) 

We assume the cumulative number of influenza notifications, our observed epidemic 
data, is given by C(t). Seven model parameters 71, a, g, p, -E(O), /(O)) are estimated 
from the epidemic curve by least squares fitting as explained below. The reproduction 
number for model (ITOl) is given by (see [38]): 



n 



(3k 



P 



+ 



a 



k + ix\ V7i + Q; + /i (71 + a + yu)(72 + 5 + /i) 
and the clinical reporting proportion is given by: 



;i-p) 



11+ p 



:iii 



o 



a 



a + 7i + 



(12) 



3.1 Parameter estimation 

In the simplest manner, model parameters can be estimated via least-square fitting of 
the model solution to the observed data. That is, one looks for the set of parameters 
whose model solution best fits the epidemic data by minimizing the sum of the squared 
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differences between the observed data yt and the model solution C(t, 0). That is, we 
minimize: 

n 

^(0) = E(yt-c(t,0))' (13) 
t=i 

The standard deviation of the parameters can be estimated by computing the asymp- 
totic variance-covariance AV{Q) matrix of the least-squares estimate by j78| : 

Av(e) = a2(v@c(eo) VeC(eo)^)-^ (14) 

which can be estimated by 

a2(v0C(e)v©c(0)^)-^ (15) 

where n is the total number of observations, o"^ is the estimated variance, and VC 
are numerical derivatives of C . Estimates of Rq can be obtained by substituting the 
corresponding individual parameter estimates into an analytical formula of Rq. Fur- 
ther, using the delta method [79], we can derive an expression for the variance of the 
estimated basic reproduction number Rq. An expression for the variance of Rq for the 
simple SEIR model (Equations [9]) is given by: 



(7 + (5)2 (i- + if- 
2 .,(c„.<^J)_^^M47)+c-o„(*.,3))}. (16) 



/3(7 + 5) 7 + 5 

This expression depends on the variance (denoted by V) of the individual parameter 
estimates as well as their covariance (denoted by Cov). 
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3.2 Bootstrap confidence intervals 

Another method to generate uncertainty bounds on the reproduction number is gen- 
erating bootstrap confidence intervals by generating sets of reahzations of the best-fit 
curve C{t) [80j. Each reahzation of the cumulative number of case notifications Ci{t) 
{i = 1, 2, . . ., m) is generated as follows: for each observation C{t) for t = 2, 3, . . ., n 
days generate a new observation C[{t) for t > 2 (Cj'(l) = C(l)) that is sampled from a 
Poisson distribution with mean: C{t) — C{t—1) (the daily increment in C{t) from day 
t — 1 to day t). The corresponding realization of the cumulative number of infiuenza 
notifications is given by Ci{t) = X]j=i ^ii^) where t = 1, 2, 3, . . ., n. The reproduction 
number was then estimated from each of 1000 simulated epidemic curves to generate 
a distribution of R estimates from which simple statistics can be computed including 
95% confidence intervals. These statistics need to be interpreted with caution. For ex- 
ample, 95% confidence intervals for R derived from our bootstrap sample of R should 
be interpreted as containing 95% of future estimates when the same assumptions are 
made and the only noise source is observation error. It is tempting but incorrect to 
interpret these confidence intervals as containing the true parameters with probability 
0.95. Figure [2] shows the temporal distributions of the reproduction number and the 
proportion of the clinical reporting obtained by the bootstrap method after fitting the 
complex SEIR epidemic model to the initial phase of the Fall influenza wave using 17 
epidemic days of the Spanish Flu Pandemic in San Francisco, California. 
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4 Primer of mathematics and statistics of Rq and 

R{t) 

In addition to the estimation of Rq, it is of practical importance to evaluate time- 
dependent variations in the transmission potential. Explanation of the time course of 
an epidemic can be partly achieved by estimating the effective reproduction number, 
R{t), defined as the actual average number of secondary cases per primary case at 
time t (for t > 0) [41] p] [ii] [81] [sj 



Although effective interventions against Spanish 
influenza may have been limited in the early 20th century, it is plausible that the contact 
frequency leading to infection varied with time owing to the huge number of deaths 
and dissemination of information through local media {e.g. newspapers). R{t) shows 
time-dependent variation with a decline in susceptible individuals (intrinsic factors) 
and with the implementation of control measures (extrinsic factors). If R{t) < 1, it 
suggests that the epidemic is in decline and may be regarded as being under control at 
time t (vice versa, if R{t) > 1). 

4.1 Estimation of Rq using the intrinsic growth rate 

To appropriately understand the theoretical concept of R(t), let us firstly consider 
an explicitly infection-age structured epidemic model. Whereas Kermack-McKendrick 
model governed by ODEs (i.e. SIR and SEIR models as discussed above) has been well- 
known, Kermack and McKendrick had actually proposed an infection-age structured 



^R{t) should not be confused with the number of removed individuals using the same notation. In 
the following arguments of this paper, R{t) denotes the effective reproduction number. 
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model in their initial publication in 1927 [72], the mathematical importance of which 
was recognized only after 1970s [83] [Hj. Let us denote the numbers of susceptible and 
recovered individuals by S{t) and U{t). Further, let i{t,T) be the density of infectious 
individuals at time t and infection-age r {i.e. time since infection). The model is 
given by 

— + — )2(t,r) = -7(r)«(t,r) 

^ ^' (17) 



dt dr^ 

i{t,0) = X{t)S{t) 

dU{t) 



where \{t) is referred to as the force of infection (foi) {i.e. as discussed in Section 2, 
foi is defined as the rate at which susceptible individuals get infected) which is given 
by: 

X{t) = / /3{T)i{t,T)dT (18) 

Jo 

and 7(r) is the rate of recovery at infection- age r. It should be noted that the above 
model has not taken into account the background host demography {i.e. birth and 
death). In a closed population, the total population size is thus given by: 



oo 



N = S{t)+ / i{t,T)dT + U{t) (19) 
Jo 

It should also be noted that, although i{t,T) is referred to as density, it is not meant 
to be a normalized density {i.e. integral of i{t,T) over t and r does not sum up to 
1). Rather, we use density to mathematically refer to the absolute frequency in the 
infection-age space. 
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The system f ll7l) can be reasonably integrated 



i{t,T) 



T{r)j{t-T 



for t - T > 
jo(r-t), for T-t>0 



(20) 



(21) 



where 

Jit) = i{t,0) 

r(r) = exp {- -f{a) da) 
and joir) suggests the density of initially infected individuals at the beginning of an 

epidemic. In the following arguments, we call j{t) as incidence of infection [i.e. new 

infections at a given point of time t). It is not difficult to derive 



Sit) = 3(0)- / jia)da 



(22) 



from (|T71) - (EH). Thus, the subequation of i(t,0) in system ( ITTIl is rewritten as 



Jit) = m 



3(0)- I j{a)da 





(23) 



Taking into account the initial condition in fl2U]) . equation (1221) is rewritten as 



3{t) 



where 



5(0)- / j{a)da G{t)+ / i;{T)j{t-T)dT 
J L "'0 



V^(r) = /?(r)r(r) 

Git) = r/3(a + t)^^^7^Jo(a)rfa 



(24) 



(25) 



Considering the initial invasion phase (z.e. initial growth phase of an epidemic), we 
get a hnearized equation 



j(t)=5(0)G(t) + 5(0) / ,piT)jit-T)dT 

Jo 



(26) 
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The equation represents Lotka's integral equation, where the basic reproduction 
number, Rq, is given by 

POO 

Ro = 5(0) / ^(r) dr (27) 



Thus, the epidemic will grow if i?o > 1 and decline to extinction if i?o < 1- The above 
model can yield the same final size equation as seen in models governed by ODEs [T7|. 

Assuming that the infection-age distribution is stable, we get a simplified renewal 
equation 

roo 

j{t)= / AiT)jit-T)dT (28) 

Jo 

where A{t) is the product of iP{t) and S{0). Moreover, assuming that we observe an 
exponential growth of incidence during the initial phase (i.e. j{t) = kexp{rt) where k 
and r are, respectively, a constant {k > 0) and the intrinsic growth rate), the following 
relationship must be met 

j{t) =j{t-T)exp{rT) (29) 
Replacing j(t — r) in the right hand side of (l28l) by (1291) . we get 

POO 

Jit) = / A{T)j{t)exp{-rr)dT (30) 
Removing j{t) from both sides of fl30|) . we get the Lotka-Euler characteristic equation: 

e^''M(r),dr (31) 



Further, we consider a probability density of the generation time {i.e. the time from 
infection of an individual to the infection of a secondary case by that individual |85j). 
denoted by w{t): 

<r) := I \ = (32) 
Jq A[x)dx Rq 
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Using (1521) . the equation (IHTj) can be replaced by 

1 f°° 

— = / exp(— rr)w(r), (ir (33) 
-f^o Jo 

The equations (12^ - (15^ are what Wallinga and Lipsitch discussed in a recent study 
[6], reasonably suggesting the relationship between the generation time and Rq. Ac- 
cordingly, the estimator of Rq using the intrinsic growth rate is given by: 

^ = jih- 

where M(— r) is the moment generating function of the generation time distribution 
w{t), given the intrinsic growth rate r [6j q Equation ( IMl) significantly improved the 
issue of estimating Rq using the intrinsic growth rate alone, because permits val- 
idating estimates of Rq by various different distributional assumptions for w{t). The 
issue of assuming explicit distributions for latent and infectious periods has been high- 
lighted in recent studies [86] [87] [88] [89] , [90] and indeed, this point is in part addressed 
by (l34l) . because the convolution of latent and infectious periods yields w{t). Moreover, 
since the assumed lengths of generation time most likely yielded different estimates of 
Rq for Spanish influenza by different studies [60j, equation highlights a critical 



^In the original study of Wallinga and Lipsitch [6], the notation Rq is not used for equation (p4|) 
and rather document (|34l) as the estimator of R. Most likely, there are two reasons for this. First, 
we cannot assure if all individuals are susceptible to pandemic influenza before the start of epidemic 
(as discussed in Section 2). Second, we cannot assume that infection-age distribution is stable during 
the initial growth phase, which is highlighted in (|20p . Thus, it should be remembered that the above 
discussion is mathematically tight in theory, but there are certain number of assumptions to apply the 
concept to observed data. Since writing R alone is always confusing (as it is unclear if R is concerned 
with time or immunity status), here we use Rq instead. 
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need to clarify the generation time distribution using observed data. 

Here we briefly show a numerical example. Figure [3] shows the daily number of 
influenza deaths during Spanish influenza pandemic in a suburb of Zurich, 1918 |91j . 
Since the non-linear phase is difficult to analyze, our interest to estimate Rq with this 
dataset is limited to the initial growth phase only (right panel in Fig [3]). Even though 
the data represent deaths over time {i.e. not infection events with time), we can di- 
rectly extract the same intrinsic growth rate as practised with onset data, assuming 
that death data are a good proxy for morbidity data (see our discussions in Section 6). 
Assuming exponential growth in deaths as shown in ( l29l) . the intrinsic growth rate r is 
estimated to be 0.16 per day. Supposing that w{t) is arbitrarily assumed to follow a 
gamma distribution with mean G and coefficient of variation, CV = ^JVar{G)/G, Rq 
is given by 

Ro = {1 + rG{CVf)T^ (35) 

Although there is no concensus regarding the generation time of Spanish influenza, we 
assume it ranges from 2-5 days. Assuming further that CV = 0.5, Rq is estimated to 
range from 1.36 (for G = 2 day) to 2.07 (for G = 5 days). 

4.2 The concept of R{t) and its estimation 

In the following, let us consider the non-linear phase of an epidemic. Derivation of Rq 
given by flMl) assumes an exponential growth which is applicable only during the very 
initial phase of an epidemic (or, when the transmission is stationary over time), and 
thus, it is of practical importance to widen the utility of above-described renewal equa- 
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tions in order to appropriately interpret the time-course of an influenza pandemic. Let 
us explicitly account for the depletion of susceptible individuals, as we deal with an es- 
timation issue with time-inhomogeneous assumptions (i.e. non-linear phase) . Adopting 
the mass action assumption, we get: 

/•oo 

Jit) = Sit) / ijir)jit-T)dT 



(36) 

/•oo ^ ' 

= / A{t,T)jit-T)dr 
Jo 

where A{t, r) should be interpreted as the reproductive power at time t and infection- 
age r at which an infected individual generates secondary cases. We refer to the latter 
part of equation ( |36l) as a non-autonomous renewal equation, where the number of new 
infection at time t is proportional to the number of infectious individuals (as assumed 
in the renewal equation in the initial phase). 

Using equation ( 136|) . the effective reproduction number, R(t) {i.e. instantaneous 
reproduction number at calendar time t) is defined as: 

POO 

R{t) = / A{t,T)dT (37) 



Jo 

Following (|371) . we can immediately see that R{t) with an autonomous assumption {i.e. 
where contact and recovery rates do not vary with time) is given by: 

Rit) = j^^Ro (38) 

which is shown in [TTj. In practical terms, equation ( l38i) suggests that time- varying de- 
crease in transmission potential as well as decline in the epidemic reflects only depletion 
of susceptible individuals. This corresponds to a classic assumption of the Kermack 
and McKendrick model. 
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However, as we discussed in the beginning of this section, we postulate that human 
contact behaviors (and other extrinsic factors) modifies the dynamics of pandemic 
influenza, assuming that the decline in incidence does reflect not only depletion of 
susceptibles but also various extrinsic dynamics {e.g. isolation, quarantine and clo- 
sure of pubhc buildings). Thus, instead of the assumption in (!36|) . we shall assume 
time-inhomogeneous r); i.e. 

POO 

j{t) = Sit) ^{t,T)jit-T)dT 



(39) 

A{t,T)j{t-T) dr 

Jo 

to describe A{t, r). 

To derive simple estimator of R{t), it is convenient to assume separation of vari- 
ables for A{t, t) (implicitly assuming that the relative infectiousness to infection-age is 
independent of calendar time) ^92j. Under this assumption, A(t,T) is rewritten as the 
product of two functions (pi{t) and 02 (t): 

Ait,T) = Mt)Mr) (40) 
Arbitrarily assuming a normalized density for 02 (''"); ^-c, 

b2{r) dr = 1 (41) 

Jo 

then, it is easy to find that 

POD 

R(t) = / A{t,T)dT = 0i(t) (42) 



oo 



suggesting that the function 0i(t) is equivalent to the effective reproduction number 
R{t). Another function 02 (t) represents the density of infection events as a function 
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of infection-age r. Accordingly, we can immediately see that 02 (t) is exactly the same 
as wlr), the generation time distribution. That is, the above arguments suggest that 
A{t, r) {i.e. the rate at which an infectious individual at calendar time t and infection- 
age T produces secondary transmission) is decomposed as: 

A{t,T) = R{t)w{T) (43) 

Inserting (H3|) into (139|) yields an estimator of R(t) [92] : 

^^^^ ■^"(^) ^^^^ 



The above equation (l44l) is exactly what was proposed in applications to SARS [41] 
and foot and mouth disease [93]; i.e. discretizing (jSj) to apply it to the daily incidence 
data {i.e. using ji incident cases infected between time and time tj+i and descretized 
generation time distribution Wi), 

R{U) = (45) 

was used as the estimator. However, it should be noted that the study in SARS 
implicitly assumed that onset data c{t) at time t reflects the above discussed infection 
event j{t). That is, supposing that we observed Cj onset cases reported between ti and 
tj+i, R{t) was calculated as 

R{U) = =^ (46) 

where Sj is the discretized serial interval which is defined as the time from onset of a 



primary case to onset of the secondary cases [94] [95]. The method permits reasonable 
transformation of an epidemic curve {i.e. temporal distribution of case onset) to the 
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(4^ 



P{k,l)P{k,m) 



estimates of time-inhomogeneous reproduction number R{t). Employing the relative 
likelihood of case k infected by case I using the density function of serial interval 
i.e., 

s{h~ti\e) 

''''' = Er.,ksitk-Ue) ^''^ 

Using pri) . expected value and variance of R(ti) are given by the following 

^ l:ti=t k=l 
1 ""'^ / 

* k=l \l:ti=t l,m:ti=t„ 

where rit is the total number of reported case onsets at time t 

In the present day, only by using the above described methods (or similar concepts 
with similar assumptions), we can transform epidemic curves into R{t) and roughly 
assess the impact of control measures on an epidemic. However, whereas the equations 
(H5|) and fH6l) are similar in theory, we need to explicitly account for the difference be- 
tween onset and infection event. In fact, when there are many asymptomatic infections 
and asymptomatic secondary transmissions, serial interval is not equivalent to the gen- 
eration time, and thus, directly adopting the above methods would be inappropriate. 
Since this point is particularly important in analyzing influenza data, we discuss this 
issue in Section 6. 
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5 Statistical methods to estimate Rq 
5.1 Branching process 

In the previous sections, we discussed several different methods to estimate Rq either by 

(i) employing detailed curve fitting method assuming a structured epidemic model or 

(ii) using the intrinsic growth rate (or doubling time [97] [98]). Summarizing the above 
discussions, we believe that the readers should benefit from memorizing Rq = 1/M(— r) 
for the use of the intrinsic growth rate r in estimating Rq |6j and remembering the final 
size equation Rq = — ln(l— suggesting the severity of an epidemic as the theoretical 
concept. Indeed, estimator using either the intrinsic growth rate or final size has still 
continued to play an important role in discussing Rq of pandemic influenza |63] . 




However, it should be noted that deterministic models do not permit incorporating 
stochasticity explicitly {e.g. standard error for Rq is determined by measurement of 
errors alone), as the models argue only average number of secondary transmissions 
within the assumed transmission dynamics. That is, our arguments given above explore 
only the time-evolution of influenza spread in the mean field. To address the issue of 
variation in secondary transmissions, full stochastic models are called for [99] . 

From a viewpoint of data science, the discrete-time branching process, which is also 
referred to as Galton- Watson process, can reasonably assess individual heterogeneity 
in secondary transmissions [100] [101] . As we discussed the initial growth phase of 
the deterministic model, let us consider the same epidemic phase where we observe a 
geometric increase in the number of cases by generation [23]. We denote the initial 
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number of infected individuals by Cq in generation 0. Then, during the first generation, 
Ci cases are produced by secondary transmissions of Cq. Similarly, let C„ be the 
number of infections in generation n. The branching process of this type assumes that 
every infected individual has an independently and identically distributed stochastic 
random variable p-"^ representing the number of secondary cases produced by case i in 
generation (n), and that environmental stochasticity and immigration/emigration can 
be ignored. Supposing that the pattern of secondary transmission follows a discrete 
probability distribution with k secondary transmission(s); i.e., 

Pfc = Pr(pf) = A;) (fc = 0,1,2,...) (49) 

then, the expected number of secondary transmissions and the variance are given by 

(50) 

In other words, the concept of probability distribution pk reflects offspring dis- 
tribution in population ecology, and this permits explicit modeling of variations in 
secondary transmissions in infectious diseases [102] [103j . This approach is particularly 
important during the initial phase of an epidemic, because the number of infectious 
individuals is small in this stage, and thus, it is deemed essential to take into account 
demographic stochasticity, i.e., variation in the numbers of secondary transmissions 
by chance. Indeed, the model has been applied to observed outbreak data where we 
observed the extinction before growing to a major epidemic |104] [105] . 

Let us briefly discuss the variation in secondary transmissions and an estimation 
method of Rq using the discrete-time branching process, deriving analytical expres- 
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sions of the expected number of infected individuals in generation n, M„ = E{Cn) and 
the variance Vn = Var{Cn)- It is impossible to avoid using the probability generating 
function (pgf) to discuss the branching process. The above described p-"^ character- 
ize positive and discrete number of secondary transmissions, and thus, is a non-zero 
discrete random variable. The pgf of p, gp{s) is given by 



There are two basic properties concerning g{s) in relation to the epidemic process. 
First, -Ro is by definition the mean value of secondary transmissions (equation ( l50l) ) 
and, thus given by g'{l). Second, the probability that an infected individual does not 
cause any secondary transmissions Pq = Pr(p=0) is given by g{0), which is useful for 
discussing threshold phenomena and extinction jlOlj . If we note that Cq = 1 (i.e. only 
one index case), the Galton- Watson process has the following pgf identity: 

9o{s) = s 

(52) 

gn+i{s) = gn{g{s)) = g{gn{s)) 

Even when there are Cq = a cases in generation (where a > 1), we just have to 
assume that there are a different independent infection-trees and thus 



oo 




(51) 



9co{s) 



(53) 



£/c„(s) 
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From the above discussions, the expected number of cases in generation n, M„, and 
the variance Vn is 



nVar{p), {Rq = 1) (54) 

pn 1 



The process grows geometrically if -Rq > 1, stays constant if -Rq = 1, and decays ge- 
ometrically if -Rq < 1- These three cases are referred to as supercritical, critical, 
and subcritical, respectively. However, unlike the deterministic model, it should be 
remembered that critical process does not permit continued transmissions, and rather, 
the process becomes extinct almost surely (i.e. probability of extinction given Rq = 1 
is one) [TOT] . 

Mathematically, demographic stochasticity in transmission is represented by a Pois- 
son process, which has been practiced in the application of branching processes to epi- 
demics [T7j. Assuming that mean value of secondary transmissions is a constant Rq, 
the conditional distribution of observing Cn+i cases, given C„ cases, follows a Poisson 
distribution: 

Cn+i\Cn ~ Poisson [CnRo] (55) 

Supposing that we analyze influenza data documenting the generations of cases from 
to n in which we observed geometric growth, the likelihood of estimating Rq is 
proportional to 

n 

n {RoCk-if' exp (-i?oCfc_i) (56) 

k=l 
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Here we apply the above model to the Spanish influenza data in Zurich (Figure [3]) • 
Assuming that the generation time of length r, w{t), is given by the following delta 
function with the mean length 3 days, 



w{t) = < 



CXO, for r = 3 

(57) 



0, for r^3 

then the observed series of data can be grouped by generation (Cq, Ci, C2, ■■■): 

1, 3, 4, 7, 26, 30, 37,... (58) 

Since we assumed exponential growth during the initial 16 days in the previous sec- 
tion, here we similarly assume a geometric increase up to the 6th generation. Applying 
( l56l) to the above data, maximum likelihood estimate of Ro (and the corresponding 
95 percent confldence intervals) is 1.51 (1.24, 1.81). The model is simple enough to 
estimate Rq, and indeed, a slight extension of the discrete-time branching process has 
been employed to estimate Rq as well as the proportion of undiagnosed cases in the 
analysis of SARS outbreak data |106] . 

It should be noted that the discrete-time branching process assumes homogeneous 
pattern of spread. A technical issue has arisen on this subject during the SARS out- 
break. Usually, we observe some cases who produce an extraordinary number of sec- 
ondary cases compared with other infected individuals, which are referred to as super- 
spreaders. Because of this, observed offspring distributions for directly transmitted 
diseases tend to be extremely skewed to the right. Empirically, it has been suggested 
that Poisson offspring distribution is sometimes insufficient to highlight the presence 
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of superspreaders in epidemic modeling |lU7j . For example, if non-zero discrete dis- 
tribution of secondary cases follows a geometric distribution with mean Rq, the pgf is 
given by a geometric distribution with mean Rq: 



Moreover, if the offspring distribution follows gamma distribution with mean Rq and 
dispersion parameter k, the pgf g{s) follows negative binomial distribution [108j : 



We still do not know if pandemic influenza is also the case to warrant the skewed 
offspring distributions. To explicitly test if superspreading events frequently exist in 
influenza transmission, it is necessary to accumulate contact tracing data for this dif- 
ficult disease, the cases of which often show flu-like symptoms only (as discussed in 
Section 1). In addition, it should be noted that we cannot attribute the skewed off- 
spring distribution to the underlying contact network only. To date, there are two 
major reasons which can generate superspreaders: (i) those who experience very fre- 
quent contacts (social superspreader) or (ii) those who are suffering from high pathogen 
loads or those who can scatter the pathogen through the air such as the use of nebuliser 
in hospitals (biological superspreader). From the offspring distribution only, we cannot 
distinguish these two mechanisms. 

5.2 Counting process 

With regard to the estimation of Rq using final size, we briefly discuss another method 
based on a stochastic process. As we discussed above, let S{t), I(t) and Uit) be the 




(59) 




(60) 
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numbers of susceptible, infectious and recovered individuals at time t, respectively. 
Further, let (3 and I/7 denote the transmission rate and the mean duration of the 
infectious period, respectively. Supposing that K{t), the number of individuals who 
experienced infection between time and time t, is given by K{t) = S{0) — S{t), 
the two processes K{t) and U{t) are increasing counting processes where the general 
epidemic is explained by: 

Pr{dK{t) = l,dU{t) = 0\Zt) = (3S{t)I{t)dt 

Pr{dK{t) = 0,dU{t) = l\Zt) = -fl{t)dt (61) 

Pr{dK{t) = 0, dU{t) = 0\Zt) = 1 - (3S{t)I{t)dt - -fl{t)dt 
where Zf denotes the cr-algebra generated by the history of the epidemic {S{u), I{u); < u 

and S(t) = S(t)/n (where n is the size of the susceptible population at time 0). The 

latter is equivalent to assuming density-independent transmission [i.e. also referred 

to as true mass action or frequency dependent assumption [109j ). Based on equation 

( 16T]) . two zero-mean martingales jllOj are defined by: 

Mi(t) = K(t) - f' (3S(u)I(u)du 

° (62) 

M2(t) = U{t) - fl^jl{u)du 
From the martingale theory [111] , a zero-mean martingale is given by 

S{u) 7 
n n n j3 



(63) 



^(0) 5(0) - 1 S{t) + 1 7 

Thus, the estimator 6 = P/^ is given by 



Uit) 



e 



n n n 
+ -TTTTT. 7 + ••• + 



S{0) S{0)-1 S{T) + 1 

U{T) ' (64) 

—ln{l — p) 
U{T) 
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where p is the observed final size (= 1 — S{T)/n) at the end of the epidemic at time 
T. Furthermore, the variance of the zero- martingale is given by 



Var{M{t)) = Var{Mi{t)) + eWar{M2{t)) 



(65) 



From the martingale central limit theorem |112] . the estimator 6 is approximately 
normally distributed in a major outbreak in a large community. The standard error is 
then consistently estimated by: 



s.e.{9) 



n 



+ 



n 



+ ... + 



n 



{S{T) + 1) 



e^R{T) 



n n 
r + 



5(0) + 2 5(0) + 2 



f/(T) 
J - e^R{T) 



(66) 



(67) 



U{T) 

Consequently, the estimator and standard error of Rq are given by: 

Rq = n X 6 

s.e.{Ro) = n X s.e.{6) 
More detailed mathematical descriptions can be found elsewhere |74] [113] |114j . 

Here we show a numerical example. Let us consider a large epidemic of equine 
influenza {i.e. influenza in horses) as our case study. In 1971, a nationwide epidemic 
of equine-2 influenza A (H3N8) was observed in Japan |115j . For example, in Niigata 
Racecourse, 580 influenza cases were diagnosed with influenza among a total of 640 
susceptible horses. The flnal size p is thus 90.6 percent (95 percent CI: 88.4, 92.9). 
From this data, we calculate Rq and its uncertainty bounds. 

Using p = 0.906 and total number of infected U{T) = 580 in equation (1641) . 6 is 
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estimated as 0.00408. Therefore, the estimate of Rq = 2.60 is given by equation (1^ . 
Moreover, from equation fl66l) where S(T) = 639 — 579 = 60 and S'(0)(= n) = 639 (we 
assume one case was aheady infected at time t = 0), we obtain s.e.{9) = 0.000126. 
Here, 6 is assumed to follow normal distribution. Therefore, the 95 percent confidence 
interval for Rq is given as [Rq - 1.96 x 640 x 0.000126, Ro + 1.96 x 640 x 0.000126] = 
[2.44,2.76]. 

When applying the final size equation, it should be remembered that (i) we assume 
all individuals are initially susceptible (in the above described model) and (ii) we 
assume P and 7 are independent of time {i.e. constant), and thus, that any extrinsic 
factors should not have influenced the course of the observed epidemic. 

5.3 Epidemics with two levels of mixing 

In the above described models, we always assumed that the pattern of influenza trans- 
mission is homogeneous, which is clearly unrealistic to capture the transmission dy- 
namics of influenza. Since the last century, it has already been understood that the 
transmission dynamics are not sufficiently modeled by assuming homogeneous mixing. 
However, because more detailed data are lacking {e.g. epidemic records of pandemic 
influenza with time, age and space), what we could offer has been mainly to extract 
the intrinsic growth rate from the initial exponential growth, and estimate Rq using 
the estimator based on a model with the homogeneous mixing assumption. 

One line of addressing heterogeneous patterns of transmission using the observed 
data is separating household transmission from community transmission. In other 
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words, it is of practical importance to distinguish between individual and group Rq 
[116] . From the beginning of explicit modeling of influenza |117] |118] . a method to 
separately estimate the transmission parameters has been proposed, which has been 
partly extended in a recent study JW2\ or applied to further old data of pandemic in- 
fluenza |119j . Indeed, an important aspect of this issue was highlighted in a recent 
study which compared estimates of Rq between those having casual and close contacts 
[63] . To estimate key parameters of household and community transmissions of in- 
fluenza, or to simulate realistic patterns of influenza spread, such a consideration is 
fruitful. 

Mathematically elaborating this concept, there are several publications which pro- 
posed the basis of analyzing household transmission data employing stochastic models 
[120] |121]|122j . Moreover, a rigorous study has been made to estimate parameters de- 
termining the intrinsic dynamics {e.g. infectious period) using household transmission 
data with time |123j . 

Future challenges on the estimation of Ro include the application of such theo- 
ries to the observed data with some extension. For example, as we discussed above, 
knowing the generation time would be crucial to elucidate a robust estimate of Rq 
[6] [HH] [92] [Sn] . However, we do not know if the generation time varies between close 
and casual contacts; this should be the case, because, as long as the generation time 
is given by covolution of latent and infectious periods, close contact should lead to 
shorter generation time than casual contact. In future studies, influenza models may 
better to highlight the increasing importance of considering household transmission to 
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estimate the transmission potential using the temporal distribution of infection events. 

6 Characteristics of influenza data 

Except for our approach in Section 3, mathematical arguments given in this paper 
are not particularly special for influenza. In other words, we modelers have employed 
similarly structured models which describe the population dynamics of other directly 
transmitted diseases, and such models are applicable not only for influenza but also 
for many viral diseases including measles, smallpox, chickenpox, rubella and so on 
[18] . However, influenza has many different epidemiologic characteristics compared 
to other childhood viral diseases. For instance, following the previous efforts in in- 
fluenza epidemiology jl24j jl25j [126] and modeling [127j[128j . we should at least note 
the following: 

1. Detailed mechanisms of immunity have yet to be clarified. Since influenza virus 
has an wide antigenic diversity {i.e. unlike other childhood viral diseases, anti- 
genic stimulation is not monoclonal), this complicates our understanding in the 
fraction of immune individuals, cross-protection mechanisms and evolutionary 
dynamics [129] [130] pT] [132] . 

2. Flu-like symptoms are too common, and thus, we cannnot explicitly distinguish 
influenza from other common viral infections without expensive laboratory tests 
for each case. Because of this character, it is difficult to effectively implement 
usual public health measures {e.g. contact tracing and isolation). 
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3. Although exphcit estimates are hmited |133j|134] . a certain fraction of infected 
individuals does not exhibit any symptoms (following infection). This compli- 
cates not only the eradication [135j but also epidemiologic evaluations of vaccines 
and therapeutics [136] . 

4. Looking into the details of the intrinsic dynamics [1] |123 ]. it appears recently 
that the generation time and infectious period are much shorter than what were 
believed previously. Therefore, despite the relatively small Rq estimate, the turn- 
over of a transmission cycle [i.e. speed of growth) is rather quick. The incubation 
period of Spanish influenza is as short as 1.5 days |137j . complicating the imple- 
mentation of quarantine measures [138j . 

Thus, depending on the characteristics of observed data (and the specific purpose of 
modeling), we have to highlight these factors referring to the best available evidence. 
This is one of the most challenging issues in designing public health interventions 
against a potential future pandemic. 

6.1 What is reported in the observed data? 

In addition to the above described issue, we, of course, must remember what the re- 
ported data is. In many studies, the compartment /(t) or relevant class of infectious 
individuals of the SIR (or SEIR) model was fitted to the observed data. Indeed, in the 
majority of previous classic studies, R(t) {i.e. removed class; denoted by U(t) in our 
discussion) of Kermack and McKendrick model was fitted to the data, assuming that 
the removed class highlights observed data as the reported cases no longer produce 
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secondary cases. However, the observed epidemiologic data is actually neither I{t) nor 
U{t). Always, what we get as the temporal distribution reflects case onset or deaths 
with time which is mostly accompanied by some reporting delay. 

We believe this is one of the most challenging issues in epidemic modeling. Ex- 
cept for rare examples in sexually transmitted diseases, infection event is not directly 
observable, and thus, we have to maximize the utility of reported (observed) data, 
exphcitly understanding what the data represents. 

In this case, back-calculation of the infection events is called for. Let c{t) denote 
the number of onsets at time t, this should be modeled by using incidence j(t) and the 
density of the incubation period of length r, /(r): 



^0 

Further, supposing that b(t) is the number of reported cases at time t and the density 
of reporting delay of length a is h{a), observed data is modeled as: 



That is, only by using the observed data b{t) and known information of the reporting 
delay h{s) and incubation period distributions /(r), we can translate the observed data 
into infection process j{t). 

In some cases, only death data with time, d{t), is available [GOj. Similarly, this 
can be modeled using the backcalculation. Let q denote the case fatality of influenza 
which is reasonably assumed time-independent, and further let m(M) be the relative 




(68) 



m 



c(t — s)h{s) ds 



(69) 



/o~ ir^it - s - r)f{T) dTh{s) ds 
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frequency of time from onset to death, d{t) is given by: 

poo 

— 1 c{t — u)m{u) du (70) 
Jo 

Even when using onset data with delay or death data, it should be noted that the 
intrinsic growth rate is identical to that estimated from the infection event distribution. 
Assuming that the incidence j{t) exhibits exponential growth during the initial phase 
of an epidemic, i.e., j(t) = kexp{rt), equations (1681) and (1701) can be rewritten as 
b{t) = kexp{r(t — s — T))f{T) drh^s) ds 

= /cexp(rt) Jp°° exp(— r(s + T))f{T) drh^s) ds 

and 

d(t) = q j (t — u — r) f (r) dTm{u) du 

= q k exp(r(t — u — r))f(T) dTm{u) du {^'^) 

= qk exp(rt) exp(— r(M + T))f{T) drmiu) du 

Thus, the growth terms exp(rt) {i.e. which depends on time) of h{t) and d{t) are still 

identical to that of incidence j{t). In other words, mathematically the equations (ffTl) 

and (1721) could be a justification to extract an estimate of the intrinsic growth rate from 

cases with reporting delay or deaths with time. However, we should always remember 

that the infection-age distribution is not stable during the initial phase, and moreover, 

this method cannot address individual variation in the secondary transmissions {e.g. 

superspreaders, as we discussed in Section 5). 

6.2 What to be learnt from the reported data? 

In this way, it's not an easy task to clarify the infection events with time. A simi- 
lar application of the convolution equation has been intensively studied in modeling 
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HIV/AIDS. Since AIDS has a long incubation period, and because AIDS diagnosis 
is certainly reported in the surveillance data (at least, in industrialized countries), 
backcalculation of the number of HIV infections with time using the nubmer of AIDS 
diagnoses and the incubation period distribution has been an issue to capture the whole 
epidemiologic picture of HIV/AIDS [139] [140] |141j p32] . In the current modeling prac- 
tice using the temporal distribution of onset events, we are now faced with a need to 
apply this technique to diseases with much shorter incubation periods. 

Now, let's look back at a method to estimate R{t), which was proposed by Wallinga 
and Teunis plj. Whereas the method has a background of mathematical reasoning 
(as shown in fH^ . Section 4.2), the estimator was derived implicitly assuming that 
observed data exactly reflects infection events. If asymptomatic infection and transmis- 
sion are rare, this assumption might be justifiable as the lengths of the serial interval 
and generation time become almost identical. However, as long as we cannot ignore 
asymptomatic transmissions, which is particularly the case for influenza, the assump- 
tion s(cr) = w{t) might be problematic [85] . 

Since R{t) of this method was given by summing up the probability of causing 
secondary transmissions by an onset case at the onset time of this case t, we should 
rewrite the assumption using a modified onset-based renewal equation as follows |60j : 



For simplicity, we ignore reporting delay in the observed data, roughly assuming that 
the observed data reflects c(t). Translating equation ( l73l) in words, it is implicitly 
assumed that secondary transmission happens exactly at the time of onset, and based 




(73) 
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on this assumption, R{t) in the right hand side of (175]) can be backcalculated. 

To understand the assumptions behind the above equation, let us assume that 
incidence j{t) is given by 

/•oo 

Jit) = Sit) / f3ia)Tia)cit-a)da (74) 
Jo 

where /3((t) is the transmission rate at disease-age a (z.e. the time since onset of 
infection |143j ) and r(cr) is the survivorship of cases following onset. It should be 
noted that equation (|74l) ignores secondary transmissions before onset of illness. As 
we discussed above, c(t) is given by j(t) and the incubation period distribution /(r), 

POO 

cit)= / jit-T)fir)dT (75) 
Replacing c(t) in the right hand side of fl74l) by fl75l) . we get 

POO 

Jit) = Sit) / jit-s)<Pis)ds (76) 
Jo 

where s represents infection-age (ie. time since infection), and 0(s) is given by 

0(s)= / (3ia)Tia)fis-a)da (77) 



^0 

which represents generation time distribution. From equations (1761) and (177|) . we can 
find that -R(t) is given by 

R(t) = Sit) r (Pis) ds 

(78) 

= Sit)j^Pia)Via)daj^fir)dr 

Equation ( 1781) can be further reduced to -R(t) = R^Sit) / Si^) which represents Kermack 
and McKendrick's assumption. Replacing j(t) in the right hand side of (175|) by (17i|) . 
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we get 

c(t) = / ^{t,a)c{t-a)da (79) 



where ip{t, a) denotes the serial interval distribution of calender time t and disease-age 



a: 



iP{t,a)= P{a-T)V{a-T)f{T)S{t-T)dr (80) 

^0 

Equation (IHOj) is difficult to solve as it includes Sit — r) in the right hand side. However, 
in the special case, e.g., let's say when we can assume /3(r)r(r) = fc5(r) (where k is 
constant and 5(-) is delta function), 

iP{t,a) = kf{a)S{t-a) (81) 

Inserting (EI]) back to ([79]), 

c{t) = kf{a)S{t-a)c{t-a)da 

fU) (82) 

Jo f(^)dr 

which is onset-6ase(i renewal equation which was presented in (1731) . What to be learnt 
from (|82]) is, the assumption that secondary transmission happens immediately after 
onset suggests that the incubation period distribution is identical to the serial interval 
distribution as shown above, which is a bit funny conclusion. Maximizing the utility of 
observed data has still remained an open question. 

In addition to modeling the temporal distribution, explicit modeling of asymp- 
tomatic infection is also called for |135j . Provided that there are so many asymptomatic 
transmissions which are not in the negligible order, we need to shift our concept of 
transmissibility; e.g., rather than Rq, a threshold quantity of symptomatic infection is 
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required. In such a case, application of type-reproduction number T might be useful 
[144] [145] . and it has already been put into practice |146] . 

7 What to be clarified further? 

In this review, we focused on the use of the temporal distribution of influenza to 
estimate Rq (or R{t)) and the relevant key parameters. It must be remembered that 
our arguments, almost necessarily, employed homogeneous mixing assumption, as we 
cannot extract information on heterogeneous patterns of infection from a single stream 
of temporal data alone. Presently, more information {e.g. at least, spatio-temporal 
distribution) is becoming available for influenza. In this section, we briefly sketch what 
can be (and should be) done in the future to quantify the transmission dynamics of 
pandemic influenza. 

7.1 What Rq really means? 

It's not a new issue that heterogeneous patterns of transmission could even destroy the 
mean field theory in infectious diseases. For example, in a pioneering study of gon- 
orrhea transmission dynamics by Hethcote and Yorke |147j . an importance of contact 
heterogeneity was sufficiently highlighted. Since a simple model assuming homogeneous 
mixing did not reflect the patterns of gonorrhea transmission in the United States, Het- 
hcote and Yorke divided the population in question into two; those who are sexually 
very active and not, the former of which was referred to as core group. Compared 
with the temporal distribution of infection given by a model with homogeneous mixing 
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assumption, the simple heterogeneous model with a core group revealed much quicker 
increase in epidemic size, showing rather different trajectory of an epidemic. Given 
that the variance of sexual partnership is extremely large {i.e. if the distribution of 
the frequency of sexual intercourse is extremely skewed to the right with a very long 
right tail), the estimate of Rq is shown to become considerably high. The finding sup- 
ports a vulnerability of our society to the invasion of sexually transmitted diseases. 
Following this pioneering study, considerable efforts have been made to approximately 
model the heterogeneous patterns of transmission using extended mean field equations 



In addition to such an approximation of heterogeneous transmission, recent progress 
in epidemic modeling with explicit contact network structures suggests that vari- 
ance of the contact frequency plays a key role in determining the threshold quantity, 
and in some special cases, the concept of threshold phenomena could be confused 
|150] [151j |152j |153j . In Section 4, we defined the force of infection as 



In deteministic models given by simple ODEs (which ignores infection-age), X{t) is 
equivalent to (31{t). These are what classical mean field models suggest. 

Let us account for an epidemic on networks, whose node-connectivity distribution 
{i.e. the distribution of probabilities that nodes have exactly k neighbors) follows some 
explicit distribution P{k). The force of infection Ac, which yields Rq = 1, in a static 
contact network is given by 



[T8][26] [Ti8][n9] . 





Ac 
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Here {k) denotes the average connectivity of the nodes. Assuming that P{k) follows a 
power law of the form P{k) = ck""" (where c is constant), 



k 

Given that f < 3, Ac = 0, and in such a case, Rq even becomes infinite. This implies 
that the disease spread will continue for any mean estimate of Rq. Such a network 
structure is referred to as scale free |154j . complicating disease control efforts in public 
health [155] [156] . The importance of the network structure would also be highlighted 
for t> > 3. 

For sexually transmitted infections, contact frequency is countable (unlike airborne 
infection or transmission through droplets), and v is estimated to be around 3 or a 
little larger |157] . Following such a finding, many non-sexual directly transmitted 
diseases are also modeled in the present day assuming the scale- free network |150j . 
However, it should be noted that the pattern of contact does not necessarily follow 
scale-free for all directly transmitted diseases. Indeed, there is no empirical evidence 
which suggests that the contact structure of any droplet infections follows the power 
law {i.e. we do not know if the above described contact heterogeneity is the case for 
diseases except for sexually transmitted diseases). A typical example of confusion is 
seen in the superspreading events during the 2002-03 SARS epidemics |158j . where 
we cannot explicitly attribute the phenomena either to contact network or biological 
factors (as long as contact and infection event are not directly observable). We still 
do not know how we should account for the distribution P{k) for influenza and other 
viral respiratory diseases {i.e. power law or not) which remains to be clarified for each 




(85) 
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disease in future research. 



7.2 New theory to replace mass action principle 

Methodoligical developments have been made to account for the network heterogeneity 
with data ^159j . An approximate approach to address this issue is highhghted partic- 
ularly in spatio-temporal modeling, an excellent account of which is reviewed by Matt 
Keeling [T60] . 

Even though it's difficult to quantify the transmission dynamics with an explicit 
contact network with time, there are useful analytical approximations to capture the 
dynamics of influenza (and other respiratory transmitted viral diseases) and estimate 
the transmission potential. For example, the force of infection with a power law ap- 
proximation is reasonably given by: 

A(t) =/?J(t)i+"^(t)i+* (86) 

In (1861) . a and \E' characterize the epidemic dynamics; e.g. initial growth [i.e if a is 
less than 0, the modified form acts to dampen the exponential growth of incidence) 
and endemic equihbrium [i.e. when ^ is greater than 0, density-dependent damping is 
increased). A model of this type was actually validated with measles data in England 
and Wales, comparing the prediction with that of employing the mass action principle 

Another approximation might be a pair- wise model |162j . which can explicitly ac- 
count for the correlation between connected pairs. The model reasonably permits 
deriving the force of infection A using the number of various connected paris, which 
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implies wide applicability to the epidemiologic data of sexually transmitted infections. 
Incorporating spatial heterogeneity in an approximate manner would shed light on fur- 
ther quantifications [163] [164] . and thus, simple and reasonably tractable models which 
permit spatio-temporal modeling of influenza are expected {e.g. [165j ). 

7.3 Which kind of data do we have to explore? 

Summarizing the above discussions, we have presented modeling approaches that can 
quantify the transmission potential of pandemic influenza. As we have shown, temporal 
case distributions have been analyzed in many instances, and previous efforts have come 
close to maximize the utility of temporal distributions [i.e. epidemic curve). However, 
at the same time, we have also learned that we can extract almost the intrinsic growth 
rate alone from a single time-evolution data. Accordingly, we are now faced with a need 
to clarify heterogeneous patterns of transmission and more detailed intrinsic dynamics 
of influenza [166] |167j|168j . With regard to the latter, primitive epidemiologic questions 
{e.g. probability of clinical attack given infection) remain to be answered for Spanish, 
Asian and Hong Kong influenza. Let's summarize what we need to clarify theoretically 
about pandemic influenza in list: 

1. Acquired immunity 

2. Evolutionary dynamics 

3. Multi-host species transmission 

4. Asymptomatic transmission 
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5. Attack rate {i.e. Pr (onset | infection)) 

6. Case fatality {i.e. Pr ( death | onset)) 

7. Generation time and serial interval 

8. Latent, incubation, infectious and symptomatic periods with further data 

9. Transmission potential with time, space and antigenic types 
10. Transmission potential with time and age 

These issues highlight an importance to quantify the transmission of influenza using 
not only cases {i.e. those followed onset of symptoms) but also some hint suggesting the 
infection event. For example, majority of the above listed issues could be reasonably 
addressed by implementing serological surveys {e.g. antibody titers of individuals and, 
preferably, time-delay delay distribution from infection to seroconversion). Since the 
proportion of those who do not experience symptomatic infection {i.e. probability 
of asymptomatic infection) is not small for influenza [6^|146] . case records can tell 
us little to address the above mentioned issues, and thus, historical data of Spanish 
influenza may hardly offer further information. By maximizing the utility of observed 
data, we have to clarify the dynamics of influenza further, and identify key information 
which characterize the specific mechanisms of spread. 
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Table 1: Reported estimates of Rq for pandemic influenza during the fall wave (2nd 
wave) from 1918-19 



Location 


Serial interval 

(days) 


Rq 


Autonomous system 

Oil 1 'J. 1 1 ' 

ntted with entire 
epidemic curve 


Reference 


San Francisco, USA 


6 


3.5 


Yes 


t39j 




6 


2.4 


No 


[39] 


45 Cities m the USA 


go 


2.7 


No 




UK (entire England and Wales) 


6 


1.6 


Yes 


[58] 


Scandinavian cities 


6 


1.4-1.6 


No 


[62] 


Geneva, Switzerland 


5.7 


3.8 


Yes 


[38] 


Sao Paulo, Brazil 


4.6 


2.7 


Yes 


[59] 


cities in Europe and America 


4.0 


1.2-3.0 


No 


[63] 


83 cities in the UK 


3.2, 2.6 


1.7-2.0 


No 


[a [57] 


45 cities in the USA 


2.9 


1.7 


No 


IS] 


RAF camp in the UK 


2.3 


2.9 


No 


m 


Featherston Military 


1.6 


3.1 


Yes 


m 


Camp, New Zealand*^ 


1.1 


1.8 


Yes 


m 




0.9 


1.3 


Yes 


[61] 



"Sensitivity of the R estimates to different assumptions for the serial interval was 
examined; ^Three pandemic waves were simultaneously fitted; '^The epidemic was 
observed in a community with closed contact (i.e. military camp). 
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Figure 1: Flow chart of the state progression of individuals among the dif- 
ferent epidemiological classes as modeled by the complex SEIR model. See 
equations ffTOl) . 
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Figure 2: Model fits, residuals plots, and the resulting distributions of the 
reproduction number and the proportion of the clinical reporting obtained 
after fitting the complex SEIR epidemic model to the initial phase of the 
Fall influenza wave using 17 epidemic days of the Spanish Flu Pandemic in 
San Francisco, California. See equations ffTOl) [39j. In the top panel, the epidemic 
data of the cumulative number of reported influenza cases are the circles, the solid line 
is the model best fit, and the solid blue lines are 1000 realizations of the model fit to 
the data obtained through parametric bootstrapping as explained in the main text. 
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Figure 3: Temporal distribution of Spanish influenza in Zurich. Left panel 
shows an epidemic curve {i.e. deaths distribution) of pandemic influenza in a suburb 
of Zurich in 1918. In total, 259 deaths were observed from 9 July to 18 August. Right 
panel shows observed and expected values of the cumulative number of deaths during 
the first 16 days. The intrinsic growth rate is estimated to be 0.16 per day. Data 
source: [91] 
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